Abstract

Specification  matching  is  a  way  to  compare  two  software  components.  In  the  context  of  software  reuse  and 
library  retrieval,  it  can  help  determine  whether  one  component  can  be  substituted  for  another  or  how  one 
can  be  modified  to  fit  the  requirements  of  the  other.  In  the  context  of  object-oriented  programming,  it  can 
help  determine  when  one  type  is  a  behavioral  subtype  of  another.  In  the  context  of  system  interoperability, 
it  can  help  determine  whether  the  interfaces  of  two  components  mismatch.  We  use  formal  specifications 
to  describe  the  behavior  of  software  components,  and  hence,  to  determine  whether  two  components  match. 
We  give  precise  definitions  of  not  just  exact  match,  but  more  relevantly,  various  flavors  of  relaxed  match. 
These definitions capture the notions of generalization, specialization, substitutability, subtyping, and
interoperability of software components. We write our formal specifications of components in terms of pre- and
post-condition  predicates.  Thus,  we  rely  on  theorem  proving  to  determine  match  and  mismatch.  We  give 
examples  from  our  implementation  of  specification  matching  using  the  Larch  Prover. 
1.  Motivation  and  Introduction


Specification  matching  is  a  process  of  determining  if  two  software  components  are  related.  It  underlies 
understanding  this  seemingly  diverse  set  of  questions: 

•  Retrieval.  How can I retrieve a component from a software library based on its semantics, rather than
syntactic structure?

•  Reuse.  How might I adapt a component from a software library to fit the needs of a given subsystem?

•  Substitution.  When can I replace one software component with another without affecting the observable
behavior  of  the  entire  system? 

•  Subtype.  When  is  an  object  of  one  type  a  subtype  of  another? 

•  Interoperation.  Why  is  it  so  difficult  to  make  two  independently  developed  components  work  together? 

In  retrieval,  we  search  for  all  library  components  that  satisfy  a  given  query.  In  reuse,  we  adapt  a 
component  to  fit  its  environmental  constraints,  based  on  how  well  the  component  meets  our  requirements. 
In  substitution,  we  expect  the  behavior  of  one  component  to  be  observably  equivalent  to  the  other’s;  a 
special  case  of  substitution  is  when  a  subtype  object  is  the  component  substituting  for  the  supertype  object. 
In  interoperation,  we  want  one  component  to  interact  properly  with  the  other.  Common  to  answering 
these  questions  is  deciding  when  one  component  matches  another,  where  “matches”  generically  stands  for 
“satisfies,”  “meets,”  “is  equivalent  to,”  or  “interacts  properly  with.”  Common  to  these  kinds  of  matches  is 
the  need  to  characterize  the  dynamic  behavior,  i.e.,  semantics,  of  each  software  component. 

It  is  rarely  the  case  that  we  would  want  one  component  to  match  the  other  “exactly.”  In  retrieval, 
we  want  a  close  match;  as  in  any  information  retrieval  context  [Cor,  ML94,  SM83],  we  might  be  willing  to 
sacrifice  precision  for  recall.  That  is,  we  would  be  willing  to  get  some  false  positives  as  long  as  we  do  not  miss 
any  (or  too  many)  true  positives.  In  determining  substitutability,  we  do  not  need  the  substituting  component 
to  have  the  exact  same  behavior  as  the  substituted,  only  the  same  behavior  relative  to  the  environment  that 
contains  it. 

In  this  paper  we  lay  down  a  foundation  for  different  kinds  of  semantic  matches.  We  explore  not  just  exact 
match  between  components,  but  many  flavors  of  relaxed  match.  To  be  concrete  and  to  narrow  the  focus  of 
what  match  could  mean,  we  make  the  following  assumptions: 

•  The software components in which we are interested are functions (e.g., C routines, Ada procedures, ML
functions)  and  modules  (roughly  speaking,  sets  of  functions)  written  in  some  programming  language. 
These  components  might  typically  be  stored  in  a  program  library,  shared  directory  of  files,  or  software 
repository. 

•  Associated with each component, $C$, is a signature, $C_{sig}$, and a specification of its behavior, $C_{spec}$.

Whereas signatures describe a component’s type information (which is usually statically-checkable),
specifications describe the component’s dynamic behavior. Specifications more precisely characterize the semantics
of  a  component  than  just  its  signature.  In  this  paper,  our  specifications  are  formal,  i.e.,  written  in  a  formally 
defined  assertion  language. 
Given two components, $C = \langle C_{sig}, C_{spec} \rangle$ and $C' = \langle C'_{sig}, C'_{spec} \rangle$, we define a generic component match
predicate, Match:

Definition: (Component Match)

$Match : Component \times Component \rightarrow Bool$  (1)

$Match(C, C') = match_{sig}(C_{sig}, C'_{sig}) \land match_{spec}(C_{spec}, C'_{spec})$

Two components $C$ and $C'$ match if 1) their signatures match, given some definition of signature matching
($match_{sig}$), and 2) their specifications match, given some definition of specification match ($match_{spec}$).
Although  we  define  match  as  a  conjunction,  we  can  think  of  signature  match  as  a  “filter”  that  eliminates 
the  obvious  non-matches  before  trying  the  more  expensive  specification  match. 

There  are  many  possible  definitions  for  the  signature  match  predicate,  matchsig,  which  we  thoroughly 
analyzed  in  a  previous  paper  [ZW95].  In  the  remainder  of  this  paper,  for  matchsig,  we  use  for  functions 
type  equivalence  modulo  variable  renaming  (“exact  match”  in  [ZW95]),  and  for  modules,  a  partial  mapping 
of  functions  in  the  modules  with  exact  signature  match  on  the  functions  (“generalized  module  match” 
in  [ZW95]). 

In this paper, we focus on the specification match predicate, $match_{spec}$. We write pre-/post-condition
specifications for each function, where assertions are expressed in a first-order predicate logic. Match between
two functions is then determined by some logical relationship, e.g., implication, between the two pre-/post-condition
specifications. We can then modularly¹ define match between two modules in terms of some kind
of  match  between  corresponding  functions  in  the  modules.  Given  our  choice  of  formal  specifications,  we  can 
exploit  state-of-the-art  theorem  proving  technology  as  a  way  to  implement  a  specification  match  engine. 

Specification  match  goes  a  step  beyond  signature  match.  For  functions,  signature  match  is  based  entirely 
on the functions’ types, e.g., int * int -> int, and not at all on their behavior. For example, integer addition
and  subtraction  both  have  the  same  signature,  but  completely  opposite  behavior;  the  C  library  routines 
strcpy  and  strcat  have  the  same  signature  but  users  would  be  unhappy  if  one  were  substituted  for  the  other. 
Given  a  large  software  library  or  a  large  software  system,  many  functions  will  have  identical  signatures  but 
very  different  behavior.  For  example,  in  the  C  math  library  nearly  two-thirds  of  the  functions  (31  out  of  47) 
have signature double -> double. Based on signature match alone, we cannot know if we are interoperating
with  a  function  properly  or  know  which  of  a  large  number  of  retrieved  functions  does  what  we  want.  Since 
specification match takes into consideration more knowledge about the components, it allows us to increase
the  precision  with  which  we  determine  when  two  components  match. 

In  what  follows,  we  first  briefly  describe  the  language  with  which  we  write  our  formal  specifications. 
We  define  exact  and  relaxed  match  for  functions  (Section  3)  and  then  for  modules  (Section  4).  We  discuss 
in  more  detail  applications  of  specification  match  in  the  software  engineering  context  in  Section  5  and  our 
implementation  of  a  specification  matcher  using  the  Larch  Prover  in  Section  6.  We  close  with  related  work 
and  a  summary. 


2.  Larch/ML  Specifications


We  use  Larch/ML  [WRZ93],  a  Larch  interface  language  for  the  ML  programming  language,  to  specify  ML 
functions  and  ML  modules.  Larch  provides  a  “two-tiered”  approach  to  specification  [GH93].  In  one  tier, 
the  specifier  writes  traits  in  the  Larch  Shared  Language  (LSL)  to  assert  state-independent  properties  of 
a program. Each trait introduces sorts and operators and defines equality between terms composed of
the operators (and variables of the appropriate sorts). Appendix A shows the Sequence trait, which defines
operators to generate sequences (empty and insert), to return the element or sequence resulting from deleting
an element from the beginning (or end) (first (last) and butFirst (butLast)), and to return the length of a
sequence (length) or whether a sequence is empty (isEmpty).

¹ Pun intended.

In  the  second  tier,  the  specifier  writes  interfaces  in  a  Larch  interface  language  to  describe  state-dependent 
effects  of  a  program  (see  Figure  1).  The  Larch/ML  interface  language  extends  ML  by  adding  specification 
information in special comments delimited by (*+ ... +*). The using and based on clauses link interfaces
to  LSL  traits  by  specifying  a  correspondence  between  (programming-language  specific)  types  and  LSL  sorts. 
The  specification  for  each  function  begins  with  a  call  pattern  consisting  of  the  function  name  followed  by  a 
pattern  for  each  parameter,  optionally  followed  by  an  equal  sign  (=)  and  a  pattern  for  the  result.  In  ML, 
patterns are used in binding constructs to associate names to parts of values (e.g., (x, y) names x as the first
of  a  pair  and  y  as  the  second).  The  requires  clause  specifies  the  function’s  pre-condition  as  a  predicate  in 
terms  of  trait  operators  and  names  introduced  by  the  call  pattern.  Similarly,  the  ensures  clause  specifies  the 
function’s  post-condition.  If  a  function  does  not  have  an  explicit  requires  clause,  the  default  is  requires 
true. 


    signature Stack = sig
      (*+ using Sequence +*)
      type 'a t
      (*+ based on Sequence.E Sequence.S +*)

      val create : unit -> 'a t
      (*+ create () = s
          ensures s = empty +*)

      val push : 'a t * 'a -> 'a t
      (*+ push (s, e) = s2
          ensures s2 = insert(e, s) +*)

      val pop : 'a t -> 'a t
      (*+ pop s = s2
          requires not (isEmpty(s))
          ensures s2 = butFirst(s) +*)

      val top : 'a t -> 'a
      (*+ top s = e
          requires not (isEmpty(s))
          ensures e = first(s) +*)
    end

    signature Queue = sig
      (*+ using Sequence +*)
      type 'a t
      (*+ based on Sequence.E Sequence.S +*)

      val create : unit -> 'a t
      (*+ create () = q
          ensures q = empty +*)

      val enq : 'a t * 'a -> 'a t
      (*+ enq (q, e) = q2
          ensures q2 = insert(e, q) +*)

      val rest : 'a t -> 'a t
      (*+ rest q = q2
          requires not (isEmpty(q))
          ensures q2 = butLast(q) +*)

      val deq : 'a t -> 'a
      (*+ deq q = e
          requires not (isEmpty(q))
          ensures e = last(q) +*)
    end

Figure 1: Two Larch/ML Specifications

We  will  use  the  Larch/ML  interface  specifications  of  Figure  1  as  the  “library”  for  our  examples  of 
specification  matching.  It  contains  module  specifications  for  Stack  and  Queue,  specifying  the  functions 
create,  push,  pop,  and  top  on  stacks,  and  create,  enq,  deq,  and  rest  on  queues.  We  specify  each  function’s 
pre-/post-conditions  in  terms  of  operators  from  the  Sequence  trait. 
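
To make the requires/ensures reading concrete, here is a small Python sketch that encodes the Stack specifications as pairs of predicates and checks a candidate implementation against them. The tuple model of sequences (front of the tuple is the most recently inserted element) is an assumption standing in for the Sequence trait, and the names `STACK_SPEC`, `satisfies`, `push`, `pop`, and `top` are hypothetical.

```python
# Sequences are modelled as tuples; the front of the tuple is the most
# recently inserted element (an assumption about the Sequence trait).
empty = ()

def insert(e, s):   return (e,) + s
def first(s):       return s[0] if s else None   # unspecified on empty
def but_first(s):   return s[1:]
def is_empty(s):    return s == empty

# Each Stack function spec is a (requires, ensures) pair of predicates.
STACK_SPEC = {
    "push": (lambda s, e: True,
             lambda s, e, s2: s2 == insert(e, s)),
    "pop":  (lambda s: not is_empty(s),
             lambda s, s2: s2 == but_first(s)),
    "top":  (lambda s: not is_empty(s),
             lambda s, e: e == first(s)),
}

# A hypothetical tuple-backed implementation to check against the spec.
def push(s, e): return (e,) + s
def pop(s):     return s[1:]
def top(s):     return s[0]

def satisfies(name, impl, *args):
    """If the requires clause holds, the result must meet ensures."""
    requires, ensures = STACK_SPEC[name]
    if not requires(*args):
        return True   # no guarantee is made outside the pre-condition
    return ensures(*args, impl(*args))

s = insert(2, insert(1, empty))
print(satisfies("push", push, s, 3),
      satisfies("pop", pop, s),
      satisfies("top", top, s),
      satisfies("top", top, empty))
# prints: True True True True
```

The last call never invokes `top` on the empty sequence, because the requires clause fails and the specification then promises nothing.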
3.  Function  Matching


For a function specification, S, we denote the pre- and post-condition as $S_{pre}$ and $S_{post}$, respectively. $S_{pred}$
defines the interpretation of the function’s specification as an implication between the two: $S_{pred} = S_{pre} \Rightarrow S_{post}$.
Intuitively, this interpretation means that if $S_{pre}$ holds when the function specified by S is called,
Spost  will  hold  after  the  function  has  executed  (assuming  the  function  terminates).  If  Spre  does  not  hold, 
there  are  no  guarantees  about  the  behavior  of  the  function.  This  interpretation  of  a  pre-  and  post-condition 
specification  is  the  most  common  and  natural  for  functions  in  the  standard  programming  model. 

For example, for the Stack top function in Figure 1, the pre-condition, $top_{pre}$, is $not(isEmpty(s))$; the
post-condition, $top_{post}$, is $e = first(s)$; and the specification predicate, $top_{pred}$, is
$(not(isEmpty(s))) \Rightarrow (e = first(s))$.
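
The implication reading of a specification predicate can be evaluated directly on sample states. The sketch below is illustrative only: sequences are modelled as Python tuples, and `first` returns the placeholder `None` on the empty sequence, which is an assumption (the trait leaves that value unspecified).

```python
empty = ()

def first(s):
    # first(empty) is left unspecified by the trait; None is a placeholder.
    return s[0] if s else None

def top_pre(s, e):  return s != empty        # not(isEmpty(s))
def top_post(s, e): return e == first(s)     # e = first(s)

def top_pred(s, e):
    # S_pred = S_pre => S_post, written as (not S_pre) or S_post
    return (not top_pre(s, e)) or top_post(s, e)

print(top_pred((1, 2), 1))   # pre holds, post holds   -> True
print(top_pred((1, 2), 2))   # pre holds, post fails   -> False
print(top_pred(empty, 7))    # pre fails, so vacuously -> True
```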

To  be  consistent  in  terminology  with  our  signature  matching  work,  we  present  function  specification 
matching  in  the  context  of  a  retrieval  application.  Example  matches  are  between  a  library  specification  S 
and a query specification Q. We assume that variables in S and Q have been renamed consistently². For
example, if we compare the Stack pop function with the Queue rest function, we must rename q to s and q2
to s2. In this section we examine several definitions of the specification match predicate ($match_{spec}(S, Q)$).
We characterize definitions as either grouping pre-conditions $S_{pre}$ and $Q_{pre}$ together and post-conditions
$S_{post}$ and $Q_{post}$ together, or relating predicates $S_{pred}$ and $Q_{pred}$. Both of these kinds of matches have a
general form.

Definition: (Generic Pre/Post Match)

$match_{pre/post}(S, Q) = (Q_{pre} \; \mathcal{R}_1 \; S_{pre}) \; \mathcal{R}_2 \; (S_{post} \; \mathcal{R}_3 \; Q_{post})$  (2)

Pre/post matches relate the pre-conditions of each component and the post-conditions of each component.
The relations $\mathcal{R}_1$ and $\mathcal{R}_3$ are either equivalence ($\Leftrightarrow$) or implication ($\Rightarrow$), but need not be the same. $\mathcal{R}_2$ is
usually conjunction ($\land$) but may also be implication ($\Rightarrow$). The matches may vary from this form by dropping
some of the terms.

Definition: (Generic Predicate Match)

$match_{pred}(S, Q) = S_{pred} \; \mathcal{R} \; Q_{pred}$  (3)

Predicate matches relate the entire specification predicates of the two components, $S_{pred}$ and $Q_{pred}$. The
relation $\mathcal{R}$ is either equivalence ($\Leftrightarrow$), implication ($\Rightarrow$), or reverse implication ($\Leftarrow$).

It  is  important  to  look  at  both  kinds  of  match.  Which  kind  of  match  is  appropriate  may  depend  on  the 
context  in  which  the  match  is  being  used  or  on  the  specifications  being  compared.  We  present  the  pre/post 
matches  in  Section  3.1  and  the  predicate  matches  in  Section  3.2.  For  each,  we  present  a  notion  of  exact 
match  as  well  as  relaxed  matches. 
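
The two generic forms can be read as templates parameterized by relations. The following Python sketch builds them as higher-order functions; the names `generic_pre_post` and `generic_pred` are hypothetical, and the functions evaluate the match formula at a single state (four truth values), whereas an actual match requires the formula to hold in every state.

```python
def iff(a, b):      return a == b
def implies(a, b):  return (not a) or b
def conj(a, b):     return a and b

def generic_pre_post(r1, r2, r3):
    """Build (Q_pre R1 S_pre) R2 (S_post R3 Q_post) from three relations."""
    def match(s_pre, s_post, q_pre, q_post):
        return r2(r1(q_pre, s_pre), r3(s_post, q_post))
    return match

def generic_pred(r):
    """Build S_pred R Q_pred, with each predicate read as pre => post."""
    def match(s_pre, s_post, q_pre, q_post):
        return r(implies(s_pre, s_post), implies(q_pre, q_post))
    return match

exact_pre_post = generic_pre_post(iff, conj, iff)
plug_in        = generic_pre_post(implies, conj, implies)
exact_pred     = generic_pred(iff)
generalized    = generic_pred(implies)

# One state: S_pre=True, S_post=True, Q_pre=False, Q_post=True
state = (True, True, False, True)
print(exact_pre_post(*state), plug_in(*state),
      exact_pred(*state), generalized(*state))
# prints: False True True True
```

At this state the pre-conditions differ, so the exact pre/post formula is false while the three more relaxed formulas are true.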


3.1.  Pre/Post  Matches 

Pre/post matches on specifications S and Q relate $S_{pre}$ to $Q_{pre}$ and $S_{post}$ to $Q_{post}$. We consider four kinds
of pre/post matches, beginning with the strongest match and progressively weakening the match by either
relaxing the relations $\mathcal{R}_1$ and $\mathcal{R}_3$ from $\Leftrightarrow$ to $\Rightarrow$, relaxing $\mathcal{R}_2$ from $\land$ to $\Rightarrow$, or dropping one or more terms.


² This renaming is easily provided by the signature matcher, and we are assuming that the signatures of S and Q match.
Exact  Pre/Post  Match

We begin by instantiating both $\mathcal{R}_1$ and $\mathcal{R}_3$ to $\Leftrightarrow$ and $\mathcal{R}_2$ to $\land$ in the generic pre/post match of Definition
2. Two function specifications satisfy the exact pre/post match if their pre-conditions are equivalent and
their  post-conditions  are  equivalent. 

Definition: (Exact Pre/Post Match)

$match_{E\text{-}pre/post}(S, Q) = (Q_{pre} \Leftrightarrow S_{pre}) \land (S_{post} \Leftrightarrow Q_{post})$

Exact  pre/post  match  is  a  strict  relation,  yet  two  different-looking  specifications  can  still  satisfy  the  match. 
Consider for example the following query Q1, based on the Sequence trait. Q1 specifies a function that
returns a sequence whose size is 0, one way of specifying a function to create a new sequence.

    signature Q1 = sig                                          (Q1)
      (*+ using Sequence +*)
      type 'a t  (*+ based on Sequence.E Sequence.S +*)
      val qCreate : unit -> 'a t
      (*+ qCreate () = s
          ensures length(s) = 0 +*)
    end

Exact  pre/post  match  holds  for  Q1  with  both  the  Stack  and  Queue  create  functions  of  Figure  1.  (The 
specifications  of  Stack  and  Queue  create  are  identical  except  for  the  name  of  the  return  value.) 

Let us look in more detail at how Q1 would match the Stack create specification. Let S be the specification
for Stack create and Q1 be the query specification. $S_{pre} = true$, $S_{post} = (s = empty)$, $Q1_{pre} = true$, and
$Q1_{post} = (length(s) = 0)$. Since both $S_{pre}$ and $Q1_{pre}$ are true, showing $match_{E\text{-}pre/post}(S, Q1)$ reduces to
proving $S_{post} \Leftrightarrow Q1_{post}$, or $(s = empty) \Leftrightarrow (length(s) = 0)$. The “if” case ($(s = empty) \Rightarrow (length(s) = 0)$)
follows immediately from the axioms in the Sequence trait about length. Proving the “only-if” case
($(length(s) = 0) \Rightarrow (s = empty)$) requires only basic knowledge about integers and the fact that for any
sequence, s, $length(s) \geq 0$, which is provable from the Sequence trait.
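
As a sanity check (not a proof), the equivalence can be tested by enumerating a small, bounded set of sequences in Python. The tuple model of sequences, the element set `ELEMS`, and the bound `MAX_LEN` are assumptions chosen for illustration; a theorem prover establishes the result for all sequences.

```python
from itertools import product

ELEMS = (0, 1)    # hypothetical element values
MAX_LEN = 3       # bound on sequence length for the check

def sequences():
    """All tuples over ELEMS of length 0..MAX_LEN."""
    for n in range(MAX_LEN + 1):
        yield from product(ELEMS, repeat=n)

# Stack/Queue create: no requires clause, ensures s = empty
s_pre  = lambda s: True
s_post = lambda s: s == ()
# Q1: no requires clause, ensures length(s) = 0
q_pre  = lambda s: True
q_post = lambda s: len(s) == 0

def exact_pre_post(s):
    # (Q_pre <=> S_pre) and (S_post <=> Q_post), evaluated at one state
    return (q_pre(s) == s_pre(s)) and (s_post(s) == q_post(s))

holds = all(exact_pre_post(s) for s in sequences())
print(holds)   # prints: True
```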

Plug-in  Match 

Equivalence is a strong requirement. For plug-in match, we relax both $\mathcal{R}_1$ and $\mathcal{R}_3$ to $\Rightarrow$ and keep $\mathcal{R}_2$ as
$\land$ in the generic pre/post match. Under plug-in match, Q matches any specification S whose pre-condition
is  weaker  (to  allow  at  least  all  the  conditions  that  Q  allows)  and  whose  post-condition  is  stronger  (to  provide 
a  guarantee  at  least  as  strong  as  Q). 

Definition: (Plug-in Match)

$match_{plug\text{-}in}(S, Q) = (Q_{pre} \Rightarrow S_{pre}) \land (S_{post} \Rightarrow Q_{post})$

Plug-in  match  captures  the  notion  of  being  able  to  “plug-in”  S  for  Q,  as  illustrated  in  Figure  2.  A  specifier 
writes  a  query  Q  saying  essentially: 

I need a function such that if $Q_{pre}$ holds before the function executes, then $Q_{post}$ holds after
it  executes  (assuming  the  function  terminates). 

With plug-in match, if $Q_{pre}$ holds (the assumption made by the specifier) then $S_{pre}$ holds (because of the
first conjunct of plug-in match). Since we interpret S to guarantee that $S_{pre} \Rightarrow S_{post}$, we can assume that
$S_{post}$ will hold after executing the plugged-in S. Finally, since $S_{post} \Rightarrow Q_{post}$ from the second conjunct of
plug-in match, we are assured of the guarantee the specifier desired.

[Figure 2: Idea Behind Plug-in Match. Code sits between the assertions <Qpre> and <Qpost>.]

For  example,  consider  the  following  query  for  an  insert  function: 


    signature Q2 = sig                                          (Q2)
      (*+ using Sequence +*)
      type 'a t  (*+ based on Sequence.E Sequence.S +*)
      val qEnq : 'a t * 'a -> 'a t
      (*+ qEnq (q1, e) = q2
          requires length(q1) < 50
          ensures length(q2) = (length(q1) + 1) +*)
    end


This  query  specification  requires  that  an  input  sequence  has  fewer  than  50  elements,  and  guarantees  that 
the  resulting  sequence  is  one  element  longer  than  the  input  sequence.  This  is  a  fairly  weak  specification.  Q2 
does  not  satisfy  exact  pre/post  match  with  any  function  in  the  library,  but  plug-in  match  holds  for  Q2  with 
both  the  Stack  push  and  the  Queue  enq  functions.  Since  push  and  enq  are  identical  except  for  their  names 
and  the  names  of  the  variables,  the  proof  of  the  match  is  the  same  for  both. 

The pre-condition requirement, $Q_{pre} \Rightarrow S_{pre}$, holds, since $S_{pre} = true$. To show that $S_{post} \Rightarrow Q_{post}$, we
assume $S_{post}$ ($q2 = insert(e, q)$), and try to show $Q_{post}$ ($length(q2) = length(q) + 1$). Substituting for q2 in
$Q_{post}$, we have $length(insert(e, q)) = length(q) + 1$, which follows immediately from the equations for length.
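
The same argument can be checked on a bounded domain. The Python sketch below is a finite enumeration under assumptions (tuples for sequences, `insert` placing the new element at the front, a two-value element set), not a substitute for the proof. It also shows why exact pre/post match fails: the two post-conditions are not equivalent.

```python
from itertools import product

ELEMS = (0, 1)   # hypothetical element values

def sequences(max_len):
    for n in range(max_len + 1):
        yield from product(ELEMS, repeat=n)

def implies(a, b):
    return (not a) or b

def insert(e, q):
    return (e,) + q

# Library spec S (Stack push / Queue enq) and query Q2, after renaming.
s_pre  = lambda q, e, q2: True
s_post = lambda q, e, q2: q2 == insert(e, q)
q_pre  = lambda q, e, q2: len(q) < 50
q_post = lambda q, e, q2: len(q2) == len(q) + 1

# q2 ranges one element longer than q so that insert(e, q) is reachable.
states = [(q, e, q2)
          for q in sequences(3) for e in ELEMS for q2 in sequences(4)]

plug_in = all(implies(q_pre(*st), s_pre(*st)) and
              implies(s_post(*st), q_post(*st)) for st in states)
posts_equivalent = all(s_post(*st) == q_post(*st) for st in states)

print(plug_in, posts_equivalent)   # prints: True False
```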

Plug-in  Post  Match 

Often we are concerned with only the effects of functions; thus a useful relaxation of the plug-in match is
to consider only the post-condition part of the conjunction. Most pre-conditions could be satisfied by adding
an additional check before calling the function. Plug-in post match is also an instance of generic pre/post
match, with $\mathcal{R}_3$ instantiated to $\Rightarrow$ but dropping the pre-condition term ($Q_{pre} \; \mathcal{R}_1 \; S_{pre}$).
Definition: (Plug-in Post Match)

$match_{plug\text{-}in\text{-}post}(S, Q) = (S_{post} \Rightarrow Q_{post})$


Consider the following query. Q3 is identical to Stack top except that Q3 has no requires clause.

    signature Q3 = sig                                          (Q3)
      (*+ using Sequence +*)
      type 'a t  (*+ based on Sequence.E Sequence.S +*)
      val qTop : 'a t -> 'a
      (*+ qTop s = e
          ensures e = first(s) +*)
    end


Q3 does not satisfy exact pre/post or plug-in match with Stack top since Q3’s pre-condition is weaker
than Stack top's. Since the post-conditions are equivalent, Q3 does satisfy plug-in post match with Stack
top.
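
A bounded Python check of this example follows. It is a sketch under assumptions: sequences are tuples, and `first` returns the placeholder `None` on the empty sequence because the trait leaves that value unspecified. The conclusion here does not depend on that choice, since the two post-conditions are the same formula and the plug-in failure comes from the pre-conditions alone.

```python
from itertools import product

ELEMS = (0, 1)   # hypothetical element values

def sequences(max_len=3):
    for n in range(max_len + 1):
        yield from product(ELEMS, repeat=n)

def implies(a, b):
    return (not a) or b

def first(s):
    return s[0] if s else None   # placeholder for the unspecified case

# S is Stack top; Q is the query Q3 (no requires clause).
s_pre  = lambda s, e: s != ()
s_post = lambda s, e: e == first(s)
q_pre  = lambda s, e: True
q_post = lambda s, e: e == first(s)

states = [(s, e) for s in sequences() for e in ELEMS]

plug_in = all(implies(q_pre(*st), s_pre(*st)) and
              implies(s_post(*st), q_post(*st)) for st in states)
plug_in_post = all(implies(s_post(*st), q_post(*st)) for st in states)

print(plug_in, plug_in_post)   # prints: False True
```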

Weak  Post  Match 

Finally, consider this even weaker match, weak post match. We instantiate $\mathcal{R}_3$ to $\Rightarrow$, as with the plug-in
matches, but relax $\mathcal{R}_2$ to $\Rightarrow$ and drop $Q_{pre}$.

Definition: (Weak Post Match)

$match_{weak\text{-}post}(S, Q) = S_{pre} \Rightarrow (S_{post} \Rightarrow Q_{post})$


A more intuitive, equivalent, predicate is $(S_{pre} \land S_{post}) \Rightarrow Q_{post}$. Sometimes assuming the pre-condition of
S helps in proving the relationship between $S_{post}$ and $Q_{post}$. We use $S_{pre}$, not $Q_{pre}$, since $S_{pre}$ is likely to
be necessary to limit the conditions under which we try to prove $S_{post} \Rightarrow Q_{post}$. The additional assumption
also means that we will have to provide an additional “wrapper” in our code to guarantee $S_{pre}$ before we
call the function specified by S.

For example, suppose we wish to find a function to delete from a sequence using the following query Q4.

    signature Q4 = sig                                          (Q4)
      (*+ using Sequence +*)
      type 'a t  (*+ based on Sequence.E Sequence.S +*)
      val qRest : 'a t -> 'a t
      (*+ qRest s = s2
          ensures length(s2) = (length(s) - 1) +*)
    end

Q4 describes a function that returns a sequence whose size is one less than the size of the input sequence.
This is a fairly weak way of describing deletion, since it does not specify which element is removed.³ While
intuitively it would seem related to Stack pop, neither plug-in nor plug-in post match holds, because we
cannot prove $S_{post} \Rightarrow Q_{post}$ ($(s2 = butFirst(s)) \Rightarrow (length(s2) = length(s) - 1)$) for the case where
$s = empty$. By adding the assumption $S_{pre}$ ($not(isEmpty(s))$), we are able to complete the proof, as we see
in the following proof sketch.

    Assume not(isEmpty(s))                            Assume S_pre                      (1)
    Assume s2 = butFirst(s)                           Assume S_post                     (2)
    length(s2) = length(s) - 1                        Attempt to prove Q_post           (3)
    length(butFirst(s)) = length(s) - 1               Apply (2) to (3)                  (4)
    Let s = insert(ec, sc)                            Since s is not empty (1), and
                                                      s generated by empty and insert   (5)
    length(butFirst(insert(ec, sc)))
        = length(insert(ec, sc)) - 1                  Substitute (5) for s in (4)       (6)
    length(sc) = length(insert(ec, sc)) - 1           Axioms for butFirst               (7)
    length(sc) = (length(sc) + 1) - 1                 Axioms for length                 (8)
    length(sc) = length(sc)                           Axioms for +, -                   (9)

³ But it still gives us a big gain in precision over signature matching; Q4 would not match other functions with the signature
'a t -> 'a t, for example, a function that reverses or sorts the elements in the sequence, or removes duplicates.
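
The role of the extra assumption can be seen in a bounded Python check. This is a sketch under assumptions: sequences are tuples, and butFirst of the empty sequence is modelled as the empty tuple (Python slicing), although the trait leaves it unspecified. Any other choice would give the same outcome, because no sequence has length -1.

```python
from itertools import product

ELEMS = (0, 1)   # hypothetical element values

def sequences(max_len=3):
    for n in range(max_len + 1):
        yield from product(ELEMS, repeat=n)

def implies(a, b):
    return (not a) or b

# S is Stack pop; Q is the query Q4 (no requires clause).
s_pre  = lambda s, s2: s != ()
s_post = lambda s, s2: s2 == s[1:]              # s2 = butFirst(s)
q_post = lambda s, s2: len(s2) == len(s) - 1

states = [(s, s2) for s in sequences() for s2 in sequences()]

# States that refute plug-in post match (S_post => Q_post).
counterexamples = [st for st in states
                   if not implies(s_post(*st), q_post(*st))]

# Weak post match: S_pre => (S_post => Q_post).
weak_post = all(implies(s_pre(*st), implies(s_post(*st), q_post(*st)))
                for st in states)

print(counterexamples, weak_post)   # prints: [((), ())] True
```

The only counterexample to the plug-in post formula is the empty input, which is exactly the case the pre-condition of pop rules out.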


3.2.  Predicate  Matches 

Recall the generic predicate match (Definition 3):

$match_{pred}(S, Q) = S_{pred} \; \mathcal{R} \; Q_{pred}$

where the relation $\mathcal{R}$ is either equivalence ($\Leftrightarrow$), implication ($\Rightarrow$), or reverse implication ($\Leftarrow$).

Note  that  this  general  form  allows  alternative  definitions  of  the  specification  predicates.  One  alternative 
is $S_{pred} = S_{pre} \land S_{post}$, which is stronger than $S_{pred} = S_{pre} \Rightarrow S_{post}$. This interpretation is reasonable in the
context  of  state  machines,  where  the  pre-condition  serves  as  a  guard  so  that  a  state  transition  occurs  only 
if  the  pre-condition  holds. 

As  we  did  with  the  generic  pre/post  match,  we  consider  instantiations  of  the  generic  predicate  match 
beginning  with  the  strictest. 

Exact  Predicate  Match 

We  begin  with  exact  predicate  match.  Two  function  specifications  match  exactly  if  their  predicates  are 
logically equivalent (i.e., $\mathcal{R}$ is instantiated to $\Leftrightarrow$). This is less strict than exact pre/post match (Section 3.1),
since  there  can  be  some  interaction  between  the  pre-  and  post-conditions. 

Definition: (Exact Predicate Match)

$match_{E\text{-}pred}(S, Q) = S_{pred} \Leftrightarrow Q_{pred}$

Our example Q1 still matches with Stack and Queue create. In fact, in cases where $S_{pre} = Q_{pre} = true$, the
exact pre/post and exact predicate matches are equivalent.

Generalized  Match 

For generalized match, we relax $\mathcal{R}$ in the generic predicate match to $\Rightarrow$. Generalized match is an intuitive
match  in  the  context  of  queries  and  libraries:  specifications  of  library  functions  will  be  detailed,  describing 
the  behavior  of  the  functions  completely,  but  we  would  like  to  be  able  to  write  simple  queries  that  focus  only 
on  the  aspect  of  the  behavior  that  we  are  most  interested  in  or  that  we  think  is  most  likely  to  differentiate 
among  functions  in  the  library.  Generalized  match  allows  the  library  specification  to  be  stronger  (more 
general)  than  the  query.  Note  that  generalized  match  is  a  weaker  match  than  plug-in  match.  Also,  if  we 
drop  the  pre-conditions  in  generalized  match,  we  get  plug-in  post  match. 
Definition: (Generalized Match)

$match_{gen}(S, Q) = S_{pred} \Rightarrow Q_{pred}$


For example, consider the following query, which is the same as Q4 but with a requires clause.

    signature Q5 = sig                                          (Q5)
      (*+ using Sequence +*)
      type 'a t  (*+ based on Sequence.E Sequence.S +*)
      val qRest : 'a t -> 'a t
      (*+ qRest s = s2
          requires not (isEmpty(s))
          ensures length(s2) = (length(s) - 1) +*)
    end


Using the exact predicate match, neither the Stack pop nor the Queue rest specifications satisfy this
query. Plug-in match does not work either because we need to assume $Q_{pre}$ ($not(isEmpty(s))$) to show
$S_{post} \Rightarrow Q_{post}$. However, the generalized match with Q5 does hold for both of these. The proof is very
similar to that for Q4 in the weak post match.
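
The three outcomes for Q5 can be reproduced on a bounded domain. The Python sketch below makes the usual assumptions (tuples for sequences, slicing for butFirst and butLast, with the empty case modelled as the empty tuple) and is a finite check rather than a proof; the helper name `check` is hypothetical.

```python
from itertools import product

ELEMS = (0, 1)   # hypothetical element values

def sequences(max_len=3):
    for n in range(max_len + 1):
        yield from product(ELEMS, repeat=n)

def implies(a, b):
    return (not a) or b

states = [(s, s2) for s in sequences() for s2 in sequences()]

def check(remove):
    """Return (exact predicate, plug-in, generalized) for Q5 against a
    library function whose post-condition is s2 = remove(s)."""
    s_pre  = lambda s, s2: s != ()
    s_post = lambda s, s2: s2 == remove(s)
    q_pre  = lambda s, s2: s != ()
    q_post = lambda s, s2: len(s2) == len(s) - 1
    s_pred = lambda s, s2: implies(s_pre(s, s2), s_post(s, s2))
    q_pred = lambda s, s2: implies(q_pre(s, s2), q_post(s, s2))
    exact_pred = all(s_pred(*st) == q_pred(*st) for st in states)
    plug_in = all(implies(q_pre(*st), s_pre(*st)) and
                  implies(s_post(*st), q_post(*st)) for st in states)
    generalized = all(implies(s_pred(*st), q_pred(*st)) for st in states)
    return exact_pred, plug_in, generalized

results = {"Stack pop": check(lambda s: s[1:]),     # butFirst
           "Queue rest": check(lambda s: s[:-1])}   # butLast
print(results)
# prints: {'Stack pop': (False, False, True), 'Queue rest': (False, False, True)}
```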

Consider  another  example  specifying  a  function  that  removes  the  most  recently  inserted  element  of  a 
sequence.  This  query  does  not  require  that  the  specifier  knows  the  axiomatization  of  sequences,  since  the 
query  uses  only  the  sequence  constructor,  insert.  The  post-condition  specifies  that  the  input  sequence,  s,  is 
the result of inserting an element ee into another sequence ss, and that the element returned, e, is the most
recently inserted element (ee). The existential quantifier (there exists) is a way of being able to name ee
and ss.

    signature Q6 = sig                                          (Q6)
      (*+ using Sequence +*)
      type 'a t  (*+ based on Sequence.E Sequence.S +*)
      val qTop : 'a t -> 'a
      (*+ qTop s = e
          requires not (isEmpty(s))
          ensures there exists ee: Sequence.E, ss: Sequence.S
                  ((s = insert(ee, ss)) and (e = ee)) +*)
    end


Again, neither the exact nor plug-in matches holds. Generalized match holds for the query with the
Stack top function, but not Queue deq, since the query specifies that the most recently inserted element is
returned. To show the generalized match, we consider two cases: $s = empty$, and $s = insert(ec, sc)$. In the
first case, the pre-conditions for both top and qTop are false, and thus the match predicate is vacuously true.
In the second case, the pre-conditions are both true, and so we need to prove that $S_{post} \Rightarrow Q_{post}$. If we
instantiate ee to ec and ss to sc, the proof goes through.
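
The existential post-condition of Q6 can be evaluated on a bounded domain by searching for witnesses ee and ss. The Python sketch below is illustrative only and assumes the tuple model of sequences, with `insert` adding at the front, `first` reading the front, and `last` reading the back; `None` is a placeholder for the values the trait leaves unspecified on the empty sequence.

```python
from itertools import product

ELEMS = (0, 1)   # hypothetical element values

def sequences(max_len=3):
    for n in range(max_len + 1):
        yield from product(ELEMS, repeat=n)

def implies(a, b):
    return (not a) or b

def insert(e, s): return (e,) + s
def first(s):     return s[0] if s else None
def last(s):      return s[-1] if s else None

def q_pred(s, e):
    # Q6: not(isEmpty(s)) => exists ee, ss. s = insert(ee, ss) and e = ee
    post = any(s == insert(ee, ss) and e == ee
               for ee in ELEMS for ss in sequences())
    return implies(s != (), post)

def top_pred(s, e): return implies(s != (), e == first(s))   # Stack top
def deq_pred(s, e): return implies(s != (), e == last(s))    # Queue deq

states = [(s, e) for s in sequences() for e in ELEMS]

def generalized(s_pred):
    # S_pred => Q_pred at every enumerated state
    return all(implies(s_pred(s, e), q_pred(s, e)) for s, e in states)

print(generalized(top_pred), generalized(deq_pred))   # prints: True False
```

One refuting state for deq is s = (0, 1) with e = 1: deq returns the oldest element, while Q6 asks for the most recently inserted one.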

Specialized  Match 

For specialized match, we instantiate $\mathcal{R}$ in the generic predicate match to $\Leftarrow$. Specialized match is the
converse of generalized match: $match_{spcl}(S, Q) = match_{gen}(Q, S)$. A function whose specification is weaker
than  the  query  might  still  be  of  interest  as  a  base  from  which  to  implement  the  desired  function.  Specialized 
match  allows  the  library  specification  to  be  weaker  than  the  query. 
Definition: (Specialized Match)

$match_{spcl}(S, Q) = Q_{pred} \Rightarrow S_{pred}$

Consider again the query Q3, which is the same as Stack top but without the pre-condition. Stack top is
thus weaker than Q3, but we can show that Q3 implies Stack top and hence specialized match holds.
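
A bounded Python check of this example is sketched below, under the same assumptions as before (tuples for sequences, `None` as a placeholder for first of the empty sequence). The specialized result does not depend on that placeholder. The generalized result shown for contrast does: it fails here because the placeholder differs from every element value.

```python
from itertools import product

ELEMS = (0, 1)   # hypothetical element values

def sequences(max_len=3):
    for n in range(max_len + 1):
        yield from product(ELEMS, repeat=n)

def implies(a, b):
    return (not a) or b

def first(s):
    return s[0] if s else None   # placeholder for the unspecified case

# Q3 has no requires clause, so Q_pred is just its post-condition.
q_pred = lambda s, e: e == first(s)
# Stack top: not(isEmpty(s)) => e = first(s)
s_pred = lambda s, e: implies(s != (), e == first(s))

states = [(s, e) for s in sequences() for e in ELEMS]

specialized = all(implies(q_pred(*st), s_pred(*st)) for st in states)
generalized = all(implies(s_pred(*st), q_pred(*st)) for st in states)

print(specialized, generalized)   # prints: True False
```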


3.3.  Relating the Function Matches

[Lattice diagram. Nodes shown: Exact Pre/Post, Exact Predicate, Plug-in, Plug-in Post, Weak Post, True.]

    Name of match            Predicate symbol         Kind of match
    Exact Pre/Post           match_E-pre/post         pre/post
    Plug-in Match            match_plug-in            pre/post
    Plug-in Post Match       match_plug-in-post       pre/post
    Weak Post Match          match_weak-post          pre/post
    Exact Predicate Match    match_E-pred             predicate
    Generalized              match_gen                predicate
    Specialized              match_spcl               predicate

Figure 3: Lattice of Main Function Specification Matches

We relate all our function specification match definitions in a lattice (Figure 3). An arrow from a match
M1 to another match M2 indicates that M1 is stronger than M2 (M1 $\Rightarrow$ M2). We also say that M2 is
more relaxed than M1. The rightmost path in the lattice shows the pre/post matches; the remainder of the
matches  are  predicate  matches. 
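
Several "stronger than" relationships follow from the definitions by propositional reasoning alone, and can be checked with a truth table over the four truth values of the pre- and post-conditions. The Python sketch below does this for a handful of pairs. The selection of pairs is an assumption drawn from the definitions and the surrounding prose, not a transcription of the arrows in Figure 3, and the names `MATCHES`, `PAIRS`, and `stronger` are hypothetical.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# Each match formula over (S_pre, S_post, Q_pre, Q_post) at one state.
MATCHES = {
    "exact_pre_post": lambda sp, so, qp, qo: (qp == sp) and (so == qo),
    "plug_in":        lambda sp, so, qp, qo: implies(qp, sp) and implies(so, qo),
    "plug_in_post":   lambda sp, so, qp, qo: implies(so, qo),
    "weak_post":      lambda sp, so, qp, qo: implies(sp, implies(so, qo)),
    "exact_pred":     lambda sp, so, qp, qo: implies(sp, so) == implies(qp, qo),
    "generalized":    lambda sp, so, qp, qo: implies(implies(sp, so), implies(qp, qo)),
    "specialized":    lambda sp, so, qp, qo: implies(implies(qp, qo), implies(sp, so)),
}

def stronger(m1, m2):
    """True if m1's formula implies m2's formula at every truth assignment.
    Pointwise implication is enough to conclude that whenever m1 holds in
    all states, m2 does too."""
    return all(implies(MATCHES[m1](*v), MATCHES[m2](*v))
               for v in product((False, True), repeat=4))

PAIRS = [("exact_pre_post", "plug_in"),
         ("plug_in", "plug_in_post"),
         ("plug_in_post", "weak_post"),
         ("exact_pre_post", "exact_pred"),
         ("exact_pred", "generalized"),
         ("exact_pred", "specialized"),
         ("plug_in", "generalized")]

print(all(stronger(a, b) for a, b in PAIRS))   # prints: True
print(stronger("generalized", "plug_in"))      # prints: False
```

The second line shows that the reverse direction does not hold pointwise: the generalized formula can be true at an assignment where the plug-in formula is false.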
The  chart  in  Figure  3  summarizes  the  matches  we  have  presented  in  this  section,  along  with  their  predicate
symbols  and  whether  the  match  is  an  instance  of  the  generic  pre/post  match  or  the  generic  predicate  match. 


Query  $match_{E-pre/post}$        $match_{plug-in}$           $match_{gen}$               $match_{weak-post}$
-----  --------------------------  --------------------------  --------------------------  --------------------------
Q1     Queue create, Stack create  Queue create, Stack create  Queue create, Stack create  Queue create, Stack create
Q2     -                           Queue enq, Stack push       Queue enq, Stack push       Queue enq, Stack push
Q3     -                           -                           -                           Stack pop
Q4     -                           -                           -                           Stack top
Q5     -                           -                           Queue rest, Stack pop       Queue rest, Stack pop
Q6     -                           -                           Stack top                   Stack top

Table  1:  Which  Ones  Match  What 


Table  1  summarizes  which  of  the  library  functions  match  each  of  the  example  queries  for  four  of  the 
matches  we  have  defined  (Exact  Pre/Post,  Plug-in,  Generalized,  Weak  Post). 


4.  Module  Matching 


Function matching addresses the problem of matching particular functions. However, a programmer may
need to compare collections of functions, e.g., ones that provide a set of operations on an abstract data
type. Most modern programming languages explicitly support the definition of abstract data types through a
separate module facility, e.g., Ada packages or C++ classes. Modules are also often used just to group a set
of related functions, like I/O routines. This section addresses the problem of matching module specifications.

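A specification of a module is an interface, $\mathcal{I} = \langle \mathcal{I}_T, \mathcal{I}_F \rangle$, where $\mathcal{I}_T$ is a multiset of user-defined types
and $\mathcal{I}_F$ is a multiset of function specifications. For a library interface, $\mathcal{I}_L = \langle \mathcal{I}_{LT}, \mathcal{I}_{LF} \rangle$, to match a query
interface, $\mathcal{I}_Q = \langle \mathcal{I}_{QT}, \mathcal{I}_{QF} \rangle$, there must be correspondences both between $\mathcal{I}_{LT}$ and $\mathcal{I}_{QT}$ and between $\mathcal{I}_{LF}$
and $\mathcal{I}_{QF}$. These correspondences vary for exact and relaxed module match.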


4.1. Exact Match

Definition: (Exact Module Match)

$M\text{-}match_E(\mathcal{I}_L, \mathcal{I}_Q, match_{fn}) = \exists$ a total function $U_F : \mathcal{I}_{QF} \rightarrow \mathcal{I}_{LF}$ such that

$U_F$ is one-to-one and onto, and

$\forall Q \in \mathcal{I}_{QF},\ match_{fn}(U_F(Q), Q)$

$U_F$ maps each query function specification $Q$ to a corresponding library function specification, $U_F(Q)$.
Since $U_F$ is one-to-one and onto, the number of functions in the two interfaces must be the same (i.e.,
$|\mathcal{I}_{LF}| = |\mathcal{I}_{QF}|$). The correspondence between each $Q$ and $U_F(Q)$ is that they satisfy the function match,
$match_{fn}$. The match parameter ($match_{fn}$) gives us a great deal of flexibility, allowing any of the function
matches defined in Section 3 to be used in matching the individual function specifications in a module
interface.


4.2. Generalized Match


Should  a  querier  really  have  to  specify  all  the  functions  provided  in  a  module  in  order  to  find  the  module? 
A  more  reasonable  alternative  is  to  allow  the  querier  to  specify  a  set  of  exactly  the  functions  of  interest  and 
match  a  module  that  is  more  general  in  the  sense  that  its  set  of  functions  may  properly  contain  the  query’s 
set. 

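Definition: (Generalized Module Match)


$M\text{-}match_{gen}(\mathcal{I}_L, \mathcal{I}_Q, match_{fn})$ is the same as $M\text{-}match_E(\mathcal{I}_L, \mathcal{I}_Q, match_{fn})$ except $U_F$ need not be
onto.

Thus whereas with $M\text{-}match_E(\mathcal{I}_L, \mathcal{I}_Q, match_{fn})$, $|\mathcal{I}_{LF}| = |\mathcal{I}_{QF}|$, with $M\text{-}match_{gen}(\mathcal{I}_L, \mathcal{I}_Q, match_{fn})$,
$|\mathcal{I}_{LF}| \geq |\mathcal{I}_{QF}|$, and $\mathcal{I}_{LF} \supseteq \mathcal{I}_{QF}$ under the appropriate renamings.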
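
The two module matches differ only in whether the mapping must be onto, which a small Python sketch makes concrete. It searches for a one-to-one mapping from query functions to library functions by brute force and takes the function match as a parameter. Everything here is illustrative: `module_match`, the dictionaries keyed by function name (a simplification of multisets), and the use of string equality on signatures as the function match are assumptions of the sketch, not the implementation described in this paper.

```python
from itertools import permutations

def module_match(lib_fns, query_fns, match_fn, onto):
    """Return a one-to-one mapping {query name: library name}, or None.

    lib_fns and query_fns map function names to specifications;
    match_fn(S, Q) decides whether library spec S matches query spec Q.
    With onto=True this is exact module match, otherwise generalized.
    """
    lib_names = list(lib_fns)
    q_names = list(query_fns)
    if onto and len(lib_names) != len(q_names):
        return None
    # Each permutation of length len(q_names) is one candidate one-to-one mapping.
    for image in permutations(lib_names, len(q_names)):
        if all(match_fn(lib_fns[l], query_fns[q]) for q, l in zip(q_names, image)):
            return dict(zip(q_names, image))
    return None

# Toy instantiation: the function match is exact signature equality.
stack_sigs = {"create": "-> C", "push": "C, E -> C", "pop": "C -> C", "top": "C -> E"}
query_sigs = {"add": "C, E -> C", "peek": "C -> E"}

def same_signature(s, q):
    return s == q

generalized = module_match(stack_sigs, query_sigs, same_signature, onto=False)
exact = module_match(stack_sigs, query_sigs, same_signature, onto=True)
```

Here `generalized` maps `add` to `push` and `peek` to `top`, while `exact` is `None` because the library offers more functions than the query names.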

What  these  definitions  make  clear  in  a  concise  and  precise  manner  is  the  orthogonality  between  function 
match  and  module  match.  In  fact,  the  module  match  definitions  are  completely  independent  of  the  fact  that 
we are matching specifications at the function level. If we use the same definitions of module matching, but
instantiate $match_{fn}$ with a function signature match, we have module signature matching [ZW95].


5.  Applications 


As  mentioned  in  Section  1,  any  problem  that  involves  comparing  the  behavior  of  two  software  components 
is  a  potential  candidate  for  specification  matching.  We  examine  three  such  problems:  retrieval  for  reuse, 
substitution  for  subtyping,  and  determining  interoperability. 


5.1.  Retrieval  for  Reuse 

If we have a library of components with specifications, we can use specification matching to retrieve
components from the library. Formally, we define the retrieval problem as follows:

Definition: (Retrieval)


Retrieve: Query Specification, Match Predicate, Component Library → Set of Components
$Retrieve(Q, match_{spec}, L) = \{C \in L : match_{spec}(C, Q)\}$

Given a query specification $Q$, a specification match predicate $match_{spec}$, and a library of component
specifications $L$, Retrieve returns the set of components in $L$ that match with $Q$ under the match predicate
$match_{spec}$. Note that the components can be either functions or modules, provided that $match_{spec}$ is
instantiated with the appropriate match. Parameterizing the definition by $match_{spec}$ also gives the user the
flexibility to choose the degree of relaxation in the specification match.
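
The definition translates almost directly into a set comprehension. The Python sketch below is a toy model under stated assumptions: specifications are (pre, post) pairs of predicates over a small integer domain, matches are decided by exhaustive evaluation rather than by theorem proving, and the library entries (`abs`, `isqrt`, `identity`) and the query are hypothetical. The plug-in match is taken as $(Q_{pre} \Rightarrow S_{pre}) \wedge (S_{post} \Rightarrow Q_{post})$, and the exact pre/post match is assumed to replace both implications by equivalences.

```python
def implies(a, b):
    return (not a) or b

DOMAIN = range(-5, 6)   # assumed finite domain for inputs x and results r

def plug_in(s, q):
    (s_pre, s_post), (q_pre, q_post) = s, q
    return (all(implies(q_pre(x), s_pre(x)) for x in DOMAIN)
            and all(implies(s_post(x, r), q_post(x, r))
                    for x in DOMAIN for r in DOMAIN))

def exact_pre_post(s, q):
    (s_pre, s_post), (q_pre, q_post) = s, q
    return (all(q_pre(x) == s_pre(x) for x in DOMAIN)
            and all(s_post(x, r) == q_post(x, r)
                    for x in DOMAIN for r in DOMAIN))

def retrieve(query, match_spec, library):
    """Retrieve(Q, match_spec, L) = {C in L : match_spec(C, Q)}."""
    return {name for name, spec in library.items() if match_spec(spec, query)}

# Hypothetical library: each entry is (pre-condition, post-condition).
library = {
    "abs": (lambda x: True, lambda x, r: r >= 0 and r in (x, -x)),
    "isqrt": (lambda x: x >= 0, lambda x, r: r * r <= x < (r + 1) * (r + 1)),
    "identity": (lambda x: True, lambda x, r: r == x),
}
# Hypothetical query: for non-negative input, return a non-negative result.
query = (lambda x: x >= 0, lambda x, r: r >= 0)

relaxed_hits = retrieve(query, plug_in, library)
exact_hits = retrieve(query, exact_pre_post, library)
```

With the relaxed match the query retrieves `abs` and `isqrt`; with the exact match it retrieves nothing, which shows how the choice of match predicate trades precision for recall.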

Using  specification  match  as  part  of  the  retrieval  process  (or  separately  on  a  given  pair  of  components) 
gives  us  assurances  about  how  appropriate  a  component  is  for  reuse.  At  the  function  level  especially,  the 
various  specification  matches  give  us  various  assurances  about  the  behavior  of  a  component  we  would  like 
to  use.  We  treat  Q  as  the  “standard”  we  expect  a  component  to  meet,  and  S  as  the  library  component  we 
would  like  to  reuse.  If  exact  pre/post  match  holds  on  S  and  Q,  we  know  that  S  and  Q  are  behaviorally 
equivalent  under  all  conditions;  using  S  for  Q  should  be  transparent.  If  the  exact  predicate  or  plug-in  match 
holds,  we  know  that  S  can  be  substituted  for  Q  and  the  behavior  specified  by  Q  will  still  hold,  although  we 
are not guaranteed the same behavior when $Q_{pre}$ is false. If the weak post match holds, we know that the
specified behavior holds when $S_{pre}$ is satisfied, which we may be able to guarantee given the specific context
in which we use that component.


5.2.  Subtyping 

Liskov  remarked  in  her  OOPSLA  ’87  keynote  address: 


The  intuitive  idea  of  a  subtype  is  one  whose  objects  provide  all  the  behavior  of  objects  of 
another  type  (the  supertype)  plus  something  extra.  What  is  wanted  here  is  something  like  the 
following substitution property [Lea89]: If for each object $o_1$ of type $S$ there is an object $o_2$ of
type $T$ such that for all programs $P$ defined in terms of $T$, the behavior of $P$ is unchanged when
$o_1$ is substituted for $o_2$, then $S$ is a subtype of $T$. [Lis87].


Behavioral notions of subtyping that attempt to capture this substitutability property have since been
defined by many, including America [Ame91], Leavens and his colleagues [Lea89, LW90, DL92], Meyer [Mey88],
and Liskov and Wing [LW94]. There are subtle differences among these subtype definitions, but
common to all is the use of pre-/post-condition specifications (1) to describe the behavior of types and (2) to
determine whether one type is a subtype of another. Let $m_T$ be a method of supertype $T$, and $m_S$ be the
corresponding method of subtype $S$. Then America, for example, defines subtype in terms of the following
pre-/post-condition rules¹ for each method of the supertype:


•  Pre-condition rule. $m_T.pre \Rightarrow m_S.pre$

•  Post-condition rule. $m_S.post \Rightarrow m_T.post$


which is just our plug-in match. Further, subtyping requires that each method in the supertype $T$ have
a corresponding method in the subtype $S$, but there may be additional methods in $S$. This corresponds
exactly to our generalized module match. More formally,

Definition: (America) Subtype


Subtype: Type, Type → Bool

$Subtype(S, T) = M\text{-}match_{gen}(S_{spec}, T_{spec}, match_{plug-in})$

The definitions of subtype suggested by the other researchers can also be cast in terms of specification match
in a straightforward way where either or both of $M\text{-}match_{gen}$ and $match_{plug-in}$ is appropriately changed.
In short, the behavioral notion of subtyping is just an instance of our more general notion of specification
match.
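
A minimal Python sketch of this subtype check follows, under simplifying assumptions that are not part of the definition: methods are paired by name rather than through an arbitrary renaming, object state is a single integer in an assumed finite range, and the two rules are checked by exhaustive evaluation instead of proof. The types `bounded_counter` and `counter` are hypothetical.

```python
def implies(a, b):
    return (not a) or b

STATES = range(0, 12)   # assumed finite state space

def plug_in(sub_m, sup_m):
    """Pre-condition and post-condition rules for one pair of methods."""
    sub_pre, sub_post = sub_m
    sup_pre, sup_post = sup_m
    pre_rule = all(implies(sup_pre(x), sub_pre(x)) for x in STATES)
    post_rule = all(implies(sub_post(x, x2), sup_post(x, x2))
                    for x in STATES for x2 in STATES)
    return pre_rule and post_rule

def is_subtype(sub, sup):
    # Every supertype method needs a matching subtype method; extras are allowed.
    return all(name in sub and plug_in(sub[name], sup[name]) for name in sup)

# Hypothetical types: method name -> (pre(x), post(x, x2)).
bounded_counter = {"inc": (lambda x: x < 10, lambda x, x2: x2 == x + 1)}
counter = {
    "inc": (lambda x: True, lambda x, x2: x2 == x + 1),
    "reset": (lambda x: True, lambda x, x2: x2 == 0),
}

counter_is_subtype = is_subtype(counter, bounded_counter)
bounded_is_subtype = is_subtype(bounded_counter, counter)
```

`counter` qualifies as a subtype of `bounded_counter` because its `inc` has a weaker pre-condition and the same post-condition, and its extra `reset` method is permitted; the converse fails on both the missing method and the pre-condition rule.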


¹ We omit the abstraction function for simplicity.



5.3.  Interoperability 


A report on the National Information Infrastructure (NII) states:

Interoperability is the ability to combine two or more systems into a single acceptably seamless
and acceptably efficient system [VLP94].

and argues that demand for interoperability of independently developed systems will grow on an
unprecedented scale, in terms of sheer volume, heterogeneity, and complexity of individual systems.

The  heart  of  an  interoperability  problem  is  that  the  interfaces  of  the  two  or  more  systems  do  not  match. 
Specification  match  is  a  way  of  determining  whether  two  system  interfaces  match  and  hence  whether  the 
systems  can  interoperate.  We  can  also  learn  something  about  components  and  their  relationships  when  a 
match  does  not  occur,  i.e.,  when  there  is  a  mismatch.  It  might  be  possible  to  resolve  mismatches  between 
two  components  if  we  know  why  they  do  not  match,  the  more  typical  scenario  when  trying  to  interoperate 
heterogeneous  components. 

Suppose  we  have  two  components,  C  and  S,  that  agree  to  communicate  using  a  remote  procedure  call 
protocol. The client C wants to use a service, op, provided by S. To interoperate with S, C must at least
match  the  signature  of  op  (passing  in  the  right  number  and  types  of  arguments)  and  its  specification  (e.g., 
establish  op’s  pre-condition). 

Even  if  their  signatures  and  pre-/post-condition  specifications  match,  however,  components  may  still 
not  interoperate.  For  example,  suppose  we  do  not  assume  that  C  and  S  agree  on  which  protocol  to  use 
to  communicate  with  each  other.  If  C  wants  to  communicate  using  non-blocking  send,  but  S  wants  to 
communicate  through  remote  procedure  call  (alternating  blocking  receives  and  sends),  then  a  “protocol 
mismatch”  can  occur.  For  a  protocol  match,  we  might  require  that  each  one  of  C’s  sends  “lines  up”  with 
each  one  of  S’s  receives  and  vice  versa.  However,  using  CSP-like  notation  to  specify  C's  and  S’s  protocols, 
we  have: 

C = send → (receive → C | send → C)

S = receive → send → S

C  might  do  four  sends  in  a  row  and  then  do  a  receive;  meanwhile,  S  deadlocks  after  doing  its  first  receive 
since it wants to do a send next, corresponding to a receive by C, but conflicting with C’s second send. That
is,  the  following  message  sequences  do  not  match: 

(C)  send  send  send  send  receive 
(S)  receive  send  receive  send  receive 

If  a  protocol  specification  is  included  in  a  component’s  interface  specification  (i.e.,  not  just  signature 
information  and  pre-/post-conditions),  then  we  can  use  a  richer  notion  of  specification  match  to  detect 
this kind of protocol mismatch. We simply extend our notion of match to include additional sub-match
predicates, e.g., $match_{protocol}$.

Definition: (Interoperates)

Interoperates: Component, Component → Bool

$Interoperates(C, C') = Match(C, C') \wedge match_{protocol}(C_{protocol}, C'_{protocol})$
where Match is from Definition 1 of Section 1.
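
The "lines up" requirement on the two message sequences shown earlier can be checked mechanically. The Python sketch below compares two finite traces step by step and reports the first step at which a send is not paired with a receive. It is only an illustration of that pairing on finite traces: it does not model CSP processes or their full behavior, and the function name `first_mismatch` is an assumption of the sketch.

```python
def first_mismatch(client_trace, server_trace):
    """Return the index of the first step where the traces fail to line up, or None.

    Two events line up when one side sends and the other receives.
    """
    complement = {"send": "receive", "receive": "send"}
    for step, (c, s) in enumerate(zip(client_trace, server_trace)):
        if complement[c] != s:
            return step
    return None

# The message sequences for C and S given in the text.
c_trace = ["send", "send", "send", "send", "receive"]
s_trace = ["receive", "send", "receive", "send", "receive"]
mismatch_at = first_mismatch(c_trace, s_trace)

# A client that follows the remote procedure call discipline lines up with S.
rpc_client = ["send", "receive", "send", "receive"]
rpc_mismatch = first_mismatch(rpc_client, s_trace)
```

For the traces in the text the first mismatch is at index 1, that is, at C's second send, which agrees with the deadlock described above; the RPC-style client produces no mismatch.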


6. Implementation


Each of the examples given in this paper has been specified in Larch/ML, translated automatically to LP
input, and proven using LP.

For each specification file (e.g., Stack.sig), we check the syntax of the specification and then translate
it into a form acceptable to LP. Namely, we generate a corresponding .lp file (e.g., Stack.lp), which
contains the appropriate declarations of variables and operators and assertions (axioms) for the pre- and
post-conditions of each function specified. Each function foo generates two operators, fooPre and fooPost;
the axioms for fooPre and fooPost are the bodies of the requires and ensures clauses of foo. Appendix B
shows Stack.lp and Q2.lp, the result of translating the Stack specification from our sample library and the
query Q2 into LP format.


%% PlugIn-Q2-Stack.lp
thaw  Stack 
thaw  Q2 

prove  (qEnqPre(s,  e,  s2)  =>  pushPre)  /\  (pushPost(s,  e,  s2)  =>  qEnqPost(s,  e,  s2)) 


Figure  4:  LP  input  for  plug-in  match  of  Stack  push  with  Q2 

We  also  generate  the  appropriate  LP  input  to  show  a  given  match  between  two  functions.  For  example, 
Figure  4  shows  the  LP  input  to  prove  the  plug-in  match  between  the  Stack  push  function  and  query  Q2. 
The thaw Stack command loads the state resulting from executing the commands in Stack.lp.
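
To see what the conjecture in Figure 4 asserts, it can be evaluated exhaustively on a small finite model. The Python sketch below transcribes the pre- and post-conditions of Stack push and Q2 from Appendix B and checks the plug-in formula over all stacks of bounded length. This is a sanity check under assumptions (a two-element domain, stacks as tuples with the top first, lengths of at most three), not a substitute for the LP proof, and the Python names are illustrative.

```python
from itertools import product

ELEMS = (0, 1)   # assumed element domain

def stacks(max_len):
    """Enumerate every stack (a tuple, top element first) up to max_len elements."""
    for n in range(max_len + 1):
        yield from product(ELEMS, repeat=n)

def insert(e, s):
    return (e,) + s

def implies(a, b):
    return (not a) or b

# Stack push, as asserted in Stack.lp.
def push_pre():
    return True

def push_post(s, e, s2):
    return s2 == insert(e, s)

# Query Q2, as asserted in Q2.lp.
def q_enq_pre(q1, e, q2):
    return len(q1) < 50

def q_enq_post(q1, e, q2):
    return len(q2) == len(q1) + 1

def plug_in_conjecture(s, e, s2):
    return (implies(q_enq_pre(s, e, s2), push_pre())
            and implies(push_post(s, e, s2), q_enq_post(s, e, s2)))

cases = [(s, e, s2) for s in stacks(2) for s2 in stacks(3) for e in ELEMS]
plug_in_holds = all(plug_in_conjecture(s, e, s2) for s, e, s2 in cases)
# The converse post-condition implication fails, so the match is relaxed.
converse_holds = all(implies(q_enq_post(s, e, s2), push_post(s, e, s2))
                     for s, e, s2 in cases)
```

On this model `plug_in_holds` is true while `converse_holds` is false: Q2 constrains only the length of the result, so push satisfies it without being equivalent to it.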


%%  Gen-Q6-Stack.lp 
thaw  Stack 
thaw  Q6 

prove  (topPre(s,  e)  =>  topPost(s,  e))  =>  (qTopPre(s,  e)  =>  qTopPost(s,  e)) 
%%  additional  user  input 

resume  by  induction 

resume  by  specializing  ss  to  sc 


Figure 5: LP input for generalized match of Stack top with Q6

Since  LP  is  designed  as  a  proof  assistant,  rather  than  an  automatic  theorem  prover,  some  of  the  proofs 
require  user  assistance.  The  example  shown  in  Figure  4  does  not  require  any  assistance  from  the  user. 
Executing  the  statements  in  Figure  4  results  ultimately  in  the  response  from  LP:  []  conjecture,  indicating 
that LP successfully proved the match conjecture. Generalized match of Stack top with Q6 requires some
assistance  to  tell  the  prover  to  use  induction  in  the  proof,  and  then  how  to  instantiate  the  existential  variables 
(Figure  5).  Figure  6  shows  LP’s  output  script  of  this  proof  execution. 


%% exec M-Gen-Q6-Stack
thaw Stack
thaw Q6

prove (topPre(s, e) => topPost(s, e)) => (qTopPre(s, e) => qTopPost(s, e))
resume by induction
<> basis subgoal
[] basis subgoal
<> induction subgoal
resume by specializing ss to sc
<> specialization subgoal
[] specialization subgoal
[] induction subgoal
[] conjecture

%% End of input from file '/usr/cimy/examples/Gen-Q6-Stack.lp'
%% quit


Figure 6: LP output for generalized match of Stack top with Q6


7. Related Work

Other work on specification matching has focused on using a particular match definition for retrieval of
software components (usually functions). Rollins and Wing proposed the idea of function specification
matching and implemented a prototype system in λProlog using plug-in match [RW91]. λProlog does not
use equational reasoning, and so the search may miss some functions that match a query but require the use
of equational reasoning to determine that they match. The VCR retrieval system [FKS94] uses plug-in match
with VDM as the specification language. The focus of this work is on efficiency of proving match; the tool
performs  a  series  of  filtering  steps  before  doing  all-out  match  (e.g.,  a  very  relaxed  signature  matching  and 
model  checking).  Perry’s  Inscape  system  [Per89]  is  a  specification-based  software  development  environment. 
Its  Inquire  tool  provides  predicate-based  retrieval  in  Inscape.  Match  is  either  exact  pre/post  or  a  form  of 
generalized  match.  The  prototype  system  has  a  simplified  and  hence  fairly  limited  inference  mechanism. 
Also,  since  specifications  must  already  be  provided  for  software  development  in  Inscape,  the  user  need  not 
write  a  separate  query  specification.  Jeng  and  Cheng  [JC92]  use  order-sorted  predicate  logic  specifications. 
Their  match  is  similar  to  our  generalized  function  match,  but  has  the  additional  property  that  it  generates 
a  series  of  substitutions  to  apply  to  the  library  component  to  reuse  in  the  desired  context.  Mili,  Mili  and 
Mittermeir  [MMM94]  define  a  specification  as  a  binary  relation.  Specification  match  is  based  on  the  refines 
ordering  on  relations,  somewhat  like  our  generalized  match.  The  PARIS  system  [KRT87]  maintains  a  library 
of  partially  interpreted  schemas.  Each  schema  includes  a  specification  of  restrictions  on  input  to  the  schema, 
assertions  about  how  the  abstract  parts  of  the  schema  can  be  instantiated,  and  assertions  about  the  results 
of  the  schema.  Matching  corresponds  to  determining  whether  a  partial  library  schema  could  be  instantiated 
to  satisfy  a  query.  The  system  does  some  reasoning  about  the  schemas  but  with  a  limited  logic.  Katoh, 
Yoshida and Sugimoto [KYS85] use English-like specifications and queries that are translated into
first-order predicate logic formulas. They use “ordered linear resolution” to determine matching between a query
and  specification,  and  include  relaxations  for  changing  the  order  of  parameters,  making  some  parameters 
constants,  or  renaming  subroutines.  However,  the  match  does  not  verify  that  the  subroutines  match  and 
checks  only  for  equivalence,  not  permitting  any  inference. 

To summarize, our work on specification matching is more general than the above in three ways: (1) we
handle not just function match, but module match; (2) we have a framework, which is extremely modular (e.g.,
function match is a parameter to module match; specification match is one conjunct of component match),
within which we can express each of the specific matches “hardwired” in the definitions used by others; and
(3) we have a flexible prototype tool that lets us easily experiment with all the different matches. Finally,
we  are  not  wedded  to  just  the  application  of  software  retrieval;  we  see  the  need  to  understand  specification 
match  as  it  relates  to  other  application  areas. 

Signature  matching  can  be  viewed  as  a  very  restricted  form  of  specification  matching.  Work  in  this  area 
has  focused  on  taking  advantage  of  the  expressiveness  and  theoretical  properties  of  type  systems  to  define 
various  forms  of  relaxed  matches  [ZW95,  DC92,  Rit92,  RT89,  SC94]. 

Less closely related work, but relevant to our context of software library retrieval, divides into two
categories: text-based information retrieval [AS86, PD89, MBK91] and AI-based semantic net classifications
[OHPDB92, FHR91]. The advantage of these approaches is that many efficient tools are available to
do the search and match in these structures. The disadvantage is that the characterization of the component’s
behavior is completely informal.


8.  Summary 


This paper makes three specific contributions with respect to specification matching: foundational
definitions, descriptions of applications, and a report on a prototype tool.

By  providing  precise  definitions,  this  paper  lays  the  groundwork  for  understanding  when  two  different 
software  components  are  related,  in  particular  when  their  specifications  match.  Though  we  consider  in 
detail  functions  and  modules,  exact  and  relaxed  match,  and  formal  pre-/post-condition  specifications,  the 
general  idea  behind  specification  matching  is  to  exploit  as  much  information  associated  with  the  description 
of  software  components  as  possible. 

Though our notion of specification match was originally motivated by the software library retrieval
application, it is more generally applicable to other areas of software engineering, for example, determining
subtyping in designing class hierarchies, or detecting an interoperability problem in a heterogeneous
distributed system.

Finally,  by  building  a  working  specification  match  engine,  we  demonstrated  the  feasibility  of  our  ideas. 
By  providing  the  community  with  a  tool,  we  are  now  in  the  position  to  explore  their  pragmatic  implications. 


A  The Sequence Trait


The Sequence trait defines operators to generate sequences (empty and insert), to return the element or
sequence resulting from deleting an element from the beginning (or end) (first (last) and butFirst (butLast)),
and to return the length of a sequence (length) or whether a sequence is empty (isEmpty).


Sequence(E, S) : trait
  includes Integer
  introduces
    empty : → S
    insert : E, S → S
    first : S → E
    last : S → E
    butFirst : S → S
    butLast : S → S
    isEmpty : S → Bool
    length : S → Int
  asserts
    S generated by empty, insert
    S partitioned by isEmpty, length
    ∀ e: E, s: S
      first(insert(e, s)) == e
      butFirst(insert(e, s)) == s
      last(insert(e, s)) == if s = empty then e else last(s)
      butLast(insert(e, s)) == if s = empty then empty
                               else insert(e, butLast(s))
      isEmpty(empty)
      ¬isEmpty(insert(e, s))
      length(empty) == 0
      length(insert(e, s)) == length(s) + 1
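
One way to build intuition for these axioms is to model the operators on Python tuples and evaluate each equation on small sequences. The sketch below is an assumed model, not part of the trait or of our tools: sequences are tuples with the most recently inserted element first, and `first` and `last` simply raise an error on the empty sequence, whereas the trait leaves their values there unspecified.

```python
from itertools import product

def empty():
    return ()

def insert(e, s):
    return (e,) + s          # insert at the front, so first(insert(e, s)) == e

def first(s):
    return s[0]

def last(s):
    return s[-1]

def but_first(s):
    return s[1:]

def but_last(s):
    return s[:-1]

def is_empty(s):
    return len(s) == 0

def length(s):
    return len(s)

def axioms_hold(elems=(0, 1), max_len=3):
    """Evaluate every asserted equation on all sequences up to max_len elements."""
    for n in range(max_len + 1):
        for s in product(elems, repeat=n):
            for e in elems:
                t = insert(e, s)
                ok = (
                    first(t) == e
                    and but_first(t) == s
                    and last(t) == (e if s == empty() else last(s))
                    and but_last(t) == (empty() if s == empty()
                                        else insert(e, but_last(s)))
                    and not is_empty(t)
                    and length(t) == length(s) + 1
                )
                if not ok:
                    return False
    return is_empty(empty()) and length(empty()) == 0

model_satisfies_axioms = axioms_hold()
```

The tuple model satisfies all eight equations on the sequences tested, which is consistent with reading insert as adding at the beginning of the sequence.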


B  LP Input


Stack.lp and Q2.lp contain the result of translating Stack and Q2 into LP input.


%% Stack.lp

execute Sequence_Axioms
set name Stack
declare var
  e: E
  s: C
  s2: C

declare op
  createPre: -> Bool
  createPost: C -> Bool
  pushPre: -> Bool
  pushPost: C, E, C -> Bool
  popPre: C, C -> Bool
  popPost: C, C -> Bool
  topPre: C, E -> Bool
  topPost: C, E -> Bool

assert
  createPre = true;
  createPost(s) = (s = empty);
  pushPre = true;
  pushPost(s, e, s2) = (s2 = insert(e, s));
  popPre(s, s2) == (~(isEmpty(s)));
  popPost(s, s2) = (s2 = butFirst(s));
  topPre(s, e) = (~(isEmpty(s)));
  topPost(s, e) = (e = first(s))


%% Q2.lp

execute Sequence_Axioms
set name Q2
declare var
  e: E
  q1: C
  q2: C

declare op
  qEnqPre: C, E, C -> Bool
  qEnqPost: C, E, C -> Bool

assert
  qEnqPre(q1, e, q2) = (length(q1) < 50);
  qEnqPost(q1, e, q2) = (length(q2) = length(q1) + 1)